Integrating Diverse Datasets Improves Developmental Enhancer Prediction
Abstract
Gene-regulatory enhancers have been identified using various approaches, including evolutionary conservation, regulatory protein binding, chromatin modifications, and DNA sequence motifs. To integrate these different approaches, we developed EnhancerFinder, a two-step method for distinguishing developmental enhancers from the genomic background and then predicting their tissue specificity. EnhancerFinder uses a multiple kernel learning approach to integrate DNA sequence motifs, evolutionary patterns, and diverse functional genomics datasets from a variety of cell types. In contrast with prediction approaches that define enhancers based on histone marks or p300 sites from a single cell line, we trained EnhancerFinder on hundreds of experimentally verified human developmental enhancers from the VISTA Enhancer Browser. We comprehensively evaluated EnhancerFinder using cross validation and found that our integrative method improves the identification of enhancers over approaches that consider a single type of data, such as sequence motifs, evolutionary conservation, or the binding of enhancer-associated proteins. We find that VISTA enhancers active in embryonic heart are easier to identify than enhancers active in several other embryonic tissues, likely due to their uniquely high GC content. We applied EnhancerFinder to the entire human genome and predicted 84,301 developmental enhancers and their tissue specificity. These predictions provide specific functional annotations for large amounts of human non-coding DNA, and are significantly enriched near genes with annotated roles in their predicted tissues and lead SNPs from genome-wide association studies. We demonstrate the utility of EnhancerFinder predictions through in vivo validation of novel embryonic gene regulatory enhancers from three developmental transcription factor loci. 
Our genome-wide developmental enhancer predictions are freely available as a UCSC Genome Browser track, which we hope will enable researchers to further investigate questions in developmental biology.
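As a rough illustration of the kernel-integration idea described in the abstract, the Python sketch below builds one kernel per data type, weights the kernels, and classifies on their weighted sum. It is not EnhancerFinder's implementation. The feature values are made up, the kernel weights come from a simple kernel-target alignment heuristic rather than a trained multiple kernel learning model, and a kernel perceptron stands in for the classifier. All function and variable names are hypothetical, and only the first step (enhancer versus background) is sketched.

```python
import math

def linear_kernel(X):
    """Gram matrix of dot products between the feature vectors in X."""
    return [[sum(a * b for a, b in zip(x, z)) for z in X] for x in X]

def alignment(K, y):
    """Kernel-target alignment between K and the label matrix y y^T."""
    n = len(y)
    num = sum(K[i][j] * y[i] * y[j] for i in range(n) for j in range(n))
    norm_k = math.sqrt(sum(v * v for row in K for v in row))
    # For +1/-1 labels the Frobenius norm of y y^T equals n.
    return num / (norm_k * n) if norm_k > 0 else 0.0

def kernel_weights(kernels, y):
    """Heuristic weights: non-negative alignments, normalized to sum to 1."""
    scores = [max(alignment(K, y), 0.0) for K in kernels]
    total = sum(scores)
    if total == 0:
        return [1.0 / len(kernels)] * len(kernels)
    return [s / total for s in scores]

def combine(kernels, weights):
    """Weighted sum of kernel matrices, the core of kernel integration."""
    n = len(kernels[0])
    return [[sum(w * K[i][j] for w, K in zip(weights, kernels))
             for j in range(n)] for i in range(n)]

def train_kernel_perceptron(K, y, epochs=20):
    """Stand-in classifier (assumption): a kernel perceptron, not an SVM."""
    n = len(y)
    alpha = [0] * n
    for _ in range(epochs):
        mistakes = 0
        for i in range(n):
            score = sum(alpha[j] * y[j] * K[j][i] for j in range(n))
            if y[i] * score <= 0:
                alpha[i] += 1
                mistakes += 1
        if mistakes == 0:
            break
    return alpha

def predict(K, y, alpha):
    """Predict +1 (enhancer) or -1 (background) for each training region."""
    n = len(y)
    return [1 if sum(alpha[j] * y[j] * K[j][i] for j in range(n)) > 0 else -1
            for i in range(n)]

# Made-up toy data: six regions, two data types ("views").
labels = [1, 1, 1, -1, -1, -1]  # +1 = enhancer, -1 = genomic background
chromatin_signal = [[2.0], [1.5], [1.8], [-1.7], [-2.0], [-1.4]]  # informative
motif_score = [[0.5], [-0.3], [0.1], [0.4], [-0.2], [0.2]]        # mostly noise

kernels = [linear_kernel(chromatin_signal), linear_kernel(motif_score)]
weights = kernel_weights(kernels, labels)
combined = combine(kernels, weights)
alpha = train_kernel_perceptron(combined, labels)
predictions = predict(combined, labels, alpha)

print("kernel weights:", weights)
print("predictions:", predictions)
```

In this toy setting the informative view receives nearly all of the weight. A real multiple kernel learning method would learn the weights jointly with the classifier and evaluate on held-out regions, as the cross validation mentioned in the abstract does.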
Similar resources
DELTA: A Distal Enhancer Locating Tool Based on AdaBoost Algorithm and Shape Features of Chromatin Modifications
Accurate identification of DNA regulatory elements becomes an urgent need in the post-genomic era. Recent genome-wide chromatin states mapping efforts revealed that DNA elements are associated with characteristic chromatin modification signatures, based on which several approaches have been developed to predict transcriptional enhancers. However, their practical application is limited by incomp...
Modelling complex features from histone modification signatures using genetic algorithm for the prediction of enhancer region.
Using Genetic Algorithm, this paper presents a modelling method to generate novel logical-based features from DNA sequences enriched with H3K4me1 histone signatures. Current histone signature is mostly represented using k-mers content features incapable of representing all the possible complex interactions of various DNA segments. The main contributions are, among others: (a) demonstrating that...
Limb-Enhancer Genie: An accessible resource of accurate enhancer predictions in the developing limb
Epigenomic mapping of enhancer-associated chromatin modifications facilitates the genome-wide discovery of tissue-specific enhancers in vivo. However, reliance on single chromatin marks leads to high rates of false-positive predictions. More sophisticated, integrative methods have been described, but commonly suffer from limited accessibility to the resulting predictions and reduced biological ...
Analysis and prediction of super-enhancers using sequence and chromatin signatures
Background: Super-enhancers are clusters of active enhancers, densely occupied by the Mediators, transcription factors and chromatin regulators, that control expression of cell identity and disease-associated genes. Current studies demonstrated the possibility of multiple factors with important roles in super-enhancer formation; however, a systematic analysis to assess the relative contribution of chr...
Complex organization of promoter and enhancer elements regulate the tissue- and developmental stage-specific expression of the Drosophila melanogaster Gld gene.
The Drosophila melanogaster Gld gene has multiple and diverse developmental and physiological functions. We report herein that interactions among proximal promoter elements and a cluster of intronically located enhancers and silencers specify the complex regulation of Gld that underlies its diverse functions. Gld expression in nonreproductive tissues is largely determined by proximal promoter e...